Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyArrayDescr methods like in np.dtype + some missing stuff in npyffi #261

Merged
merged 20 commits into from
Jan 23, 2022

Conversation

aldanor
Copy link
Contributor

@aldanor aldanor commented Jan 19, 2022

  • The npyffi stuff has been pulled out of pyo3-type-desc.
  • Methods in PyArrayDescr mostly replicate np.dtype object, plus a few methods replicating C macros (this will greatly simplify writing tests in pyo3-type-desc, plus makes PyArrayDescr object somewhat useful).

@aldanor
Copy link
Contributor Author

aldanor commented Jan 19, 2022

There's still a few TODOs and questions around that will need to be resolved, plus I wanted to add at least some trivial tests for each of the methods, and add a new() that replicates np.dtype() constructor.

@aldanor
Copy link
Contributor Author

aldanor commented Jan 19, 2022

Another thing I don't like is how typenums are encoded right now - maybe clean it up a bit?

  • NPY_TYPES enum contains NPY_BOOL = 0, ... NPY_CHAR = 26, NPY_USERDEF = 256.
  • The NPY_TYPES enum is marked as u32, however in C code it's always a c_int
  • Technically, this enum does not cover all typenums, because userdef typenums may extend past the NPY_USERDEF (according to NPY_NUMUSERTYPES constant)
  • We could replicate PyTypeNum_IS* (e.g. is_unsigned(), is_integer()) etc on the typenum, so it would have at least some use in a public API of PyArrayDescr, but then it would need a separate type.

What could be done:

  • Remove NPY_USERDEF from NPY_TYPES, replace it with a const (or with a const inside the impl)
  • What should be the repr of NPY_TYPES? (is it c_int or c_uint?)
  • Add enum TypeNum { Native(NPY_TYPES), UserDef(c_int) } (or UserDef(u32)?)
  • Add all of the PyTypeNum_IS*() macros as methods on TypeNum, e.g. is_integer())
  • Add From<> conversions from int to TypeNum and vice-versa (leq than 26 => native, otherwise userdef; not exactly precise but should be good enough)

From the public API perspective, you could then do dtype.num().is_integer() which is nicer than messing with c_int and flags.

src/dtype.rs Outdated Show resolved Hide resolved
src/dtype.rs Outdated Show resolved Hide resolved
src/dtype.rs Outdated Show resolved Hide resolved
src/dtype.rs Show resolved Hide resolved
src/dtype.rs Show resolved Hide resolved
@adamreichold
Copy link
Member

The NPY_TYPES enum is marked as u32, however in C code it's always a c_int

I don't think that it is possible to reference the type alias in a repr directive which is probably why it has this representation in the first place.

so it would have at least some use in a public API of PyArrayDescr, but then it would need a separate type.

Personally, I would actually prefer it not becoming part of the API again if possible. Put differently, I see API surface as a liability that should be avoided if possible.

@aldanor
Copy link
Contributor Author

aldanor commented Jan 19, 2022

@adamreichold Fixed all todos and a few things noted above, let me know if you think anything else is needed to change/add/remove.

(I still need to write tests for all of this; although it's all quite trivial it should all get triggered at least once somewhere in the test suite.)

As for fields - I've changed them to an iterator of unwrapped items in this commit - BUT, here's another idea: what if we change it to a simple

pub fn get_field(&self, name: &str) -> PyResult<(&PyArrayDescr, usize)>

And then we don't have maps, iterators and excessive unwrapping, it all lands onto the user and we keep it reasonably simple. Here, get_ prefix indicates that some non-trivial action will be performed and in this case it's also fallible (btw, to be consistent with other names here, I'd also suggest to rename get_type() to typeobj() or something since type is a keyword, since it doesn't do anything and just returns self->typeobj).

Given that there's already fn names(&self) -> Option<Vec<&str>>, this should be more than sufficient.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 just skim read, got a few quick thoughts. Thanks!

Would it be worth adding some tests for the new dtype methods? edit: I see you mentioned that above!

src/npyffi/types.rs Show resolved Hide resolved
src/npyffi/array.rs Show resolved Hide resolved
src/dtype.rs Show resolved Hide resolved
src/dtype.rs Show resolved Hide resolved
@davidhewitt
Copy link
Member

(I still need to write tests for all of this; although it's all quite trivial it should all get triggered at least once somewhere in the test suite.)

If there's interest, you might want to add a coverage job to CI. You can steal configuration from pyo3 :)

@aldanor
Copy link
Contributor Author

aldanor commented Jan 21, 2022

If there's interest, you might want to add a coverage job to CI. You can steal configuration from pyo3 :)

Looks pretty nice, good point! (never had a chance to use ci coverage with rust in particular for some reason, but from a brief look at pyo3's codecov reports looks pretty cool). I'll try to open a separate pr for that tomorrow here.

@aldanor
Copy link
Contributor Author

aldanor commented Jan 22, 2022

Ok:

  • Added tests for all of the methods (on a scalar, subarray and a struct type)
  • Removed subdtype and is_unsized, they are redundant
  • Replaced fields with get_field
  • Fixed base and its return type (no more Option)
  • Added PyArrayDescr:::get_type() back as hidden/deprecated
  • (various cleanup etc)

@aldanor aldanor force-pushed the feature/npyffi-and-descr branch 3 times, most recently from e7e160e to 7c988d1 Compare January 22, 2022 03:45
@aldanor
Copy link
Contributor Author

aldanor commented Jan 22, 2022

Ok, clippy's finally happy. Added a changelog entry as well. So unless there's further comments this should be good to go.

Copy link
Member

@adamreichold adamreichold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, clippy's finally happy. Added a changelog entry as well. So unless there's further comments this should be good to go.

I think so too. Will leave it open until tomorrow if anybody else wants to have a look.

(It is a matter of personal preferences, but I do find the changelog entry a bit on the wordy end of the scale. This could undermine the changelog's purpose of quickly getting an overview of the changes. The argument does not seem strong enough to me to ask for changes, I just wanted to mention it for consideration.)

Copy link
Member

@kngwyu kngwyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, thank you!

src/dtype.rs Outdated Show resolved Hide resolved
src/dtype.rs Show resolved Hide resolved
src/npyffi/types.rs Show resolved Hide resolved
@aldanor
Copy link
Contributor Author

aldanor commented Jan 22, 2022

(It is a matter of personal preferences, but I do find the changelog entry a bit on the wordy end of the scale. This could undermine the changelog's purpose of quickly getting an overview of the changes. The argument does not seem strong enough to me to ask for changes, I just wanted to mention it for consideration.)

@adamreichold I've noticed that as well. I guess I'm just used to more lengthy changelogs like the one we keep in the hdf5 crate. For me personally, if I'm an active user of some crate, I would be more than happy if the devs shared all the new functionality and features that may be relevant to me as a user. As long as it's written in a structured form in human language and doesn't look like Arrow changelog.

In cases like the one here, I suggest to always commit as much information as possible to the changelog, and before the release to shorten it down if needed and prettify it (e.g. "10 new methods added to PyArrayDescr to match np.dtype API, see docs for details"). It's usually easier to do it this way than the other way around (hunting through other contributors' commits in the commit history to see if there's anything worth noting in the changelog before the release).

Looking at the current rust-numpy's changelog, it may look as if there has been near zero development from 0.12 to 0.15 (all notes are brief 'fixed bug', 'version bump'). And yet there's around a hundred commits and a substantial a lot of work. In fact, I'd even go as far as update the previous entries in the changelog to expand them slightly and make the changelog more self-explanatory (e.g. instead of 'added einsum' -> 'added einsum!() macro equivalent to calling np.einsum()`, etc).

But anyways, I'm more of a newcomer here so that's my personal 2 cents on the subject :)

@adamreichold adamreichold merged commit 54b14e0 into PyO3:main Jan 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants